GERMAN TRAFFIC SIGN RECOGNITION
Group 13
Professor: Ran Feldesh

Image

Authors

  • Sidhartha S Mondal
  • Bharat Sharma
  • Prithu Bradhwaj
  • Chirag Bhatia

Table of Contents¶

  • Abstract
  • Introduction
  • Related Work
  • Data
  • Methods
  • Experiments
  • Conclusion
  • References
  • Streamlit
  • Next Steps

Abstract¶

Problem:

Reliable traffic sign recognition is critical for safe navigation of autonomous vehicles. Signs need to be accurately classified and detected in diverse real-world conditions.

Approach:

This project develops custom CNN and YOLO models for multi-class traffic sign classification and detection using the GTSRB dataset. The CNN model categorizes cropped sign images, while YOLOv5 and YOLOv8 detect signs in full road scenes. Models are optimized via transfer learning, augmentation, and regularization. We also explore an alternative approach that merges multiple dataset images into a single composite image for faster training and higher accuracy.

Results:

Rigorous evaluation on test set images shows strong performance, with precision and recall exceeding 90% for classification and high mean average precision for detection. No overfitting occurred during training. The optimized models can reliably recognize traffic signs in challenging real-world conditions. Integration into self-driving car perception systems could enable safe automated navigation.

Introduction¶

Reliable recognition and detection of traffic signs is a crucial capability for the safe navigation of autonomous vehicles. Traffic signs play a vital role in regulating traffic, warning of potential hazards, guiding navigation, and ensuring overall road safety. However, accurately classifying and localizing signs in diverse real-world conditions presents several challenges. These challenges include signs being obscured by obstacles, located at a significant distance from the vehicle, warped due to perspective, or under poor lighting conditions.

To tackle these issues, this project focuses on the development of deep learning models. Specifically, custom convolutional neural networks (CNNs) and YOLO (You Only Look Once) architectures are implemented to address multi-class traffic sign classification and detection. CNNs are well-suited for image recognition tasks, while YOLO is an object detection system that can efficiently identify objects, including traffic signs, in an image.

The models are trained and optimized on a large benchmark dataset containing real-world images of German traffic signs captured under various driving scenarios. This diverse dataset ensures that the models can handle a wide range of traffic sign variations and conditions they may encounter on the road.

The evaluation results demonstrate the robust performance of the developed models. The classification model achieves over 90% precision and recall, indicating its high accuracy in correctly identifying the traffic signs' classes. On the other hand, the detection model achieves a high mean average precision, which shows its effectiveness in localizing and identifying multiple traffic signs in a single image.

One key aspect of the project's success is that the models generalize well to unseen data during testing. Generalization means that the models perform well on new, unseen traffic sign images, without overfitting to the training data. Overfitting occurs when a model memorizes the training data and fails to perform well on new, unseen data. The absence of overfitting is crucial as it ensures the models can handle various real-world traffic sign scenarios and not just the specific ones present in the training dataset.

In conclusion, this project's effective traffic sign recognition solution based on deep learning represents a significant advancement in enabling safer and smoother autonomous driving. By integrating these optimized models into the perception systems of self-driving cars, the vehicles can reliably detect and classify traffic signs, even in challenging real-world conditions. This reliable traffic sign recognition capability will be a critical enabler for the widespread adoption of fully automated vehicles, helping to enhance road safety and improve overall transportation efficiency.

Related Work¶

Traffic sign recognition has been extensively researched in the past, and the emergence of deep learning has become the leading technique in this domain. Previous studies primarily focused on classifying cropped traffic sign images using convolutional neural networks (CNNs). Various custom CNN architectures, ResNet, and other models have been employed, and benchmark results on datasets like GTSRB (German Traffic Sign Recognition Benchmark) have been reported [1-3].

However, the exploration of object detection on full road scene images for traffic signs has been limited. Some researchers have applied region-based CNNs like Faster R-CNN for this task [4,5]. Notably, YOLO models, which offer a fast single-stage approach for object detection, have not been extensively leveraged in this context until now.

This project aims to address both traffic sign classification and detection, with a particular focus on using YOLO for the detection task. Specifically, YOLOv5 and YOLOv8, state-of-the-art detection models known for their real-time performance, have been implemented and evaluated. While most previous studies on the GTSRB dataset mainly addressed classification, this project extends the research to include detection capabilities.

Additionally, a novel data processing approach has been introduced in this work, where images are merged into batches to accelerate the training process. This batch processing method offers more efficient training compared to the standard image-by-image approaches commonly used.

In summary, while CNNs have been widely employed for traffic sign classification, this project introduces the use of recent YOLO variants for detection on the GTSRB dataset, bringing new capabilities to the field. Furthermore, the novel batch processing method enhances training efficiency, making the entire system more effective and practical for real-world applications. This project not only provides a strong baseline for traffic sign classification but also introduces fast and accurate detection models that can be crucial for reliable traffic sign recognition in autonomous driving and transportation systems.

Link: https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign (no YOLO-based models have been tried on this dataset page)

Data¶

Data Description:

The data came in two parts, train and test images, together with two CSV files, train.csv and test.csv. Each file lists the image path, its bounding-box (ROI) coordinates, and its class. The following is a sample from the data:
Width Height Roi.X1 Roi.Y1 Roi.X2 Roi.Y2 ClassId Path
30 29 6 5 25 24 0 Train/0/00000_00001_00002.png
65 65 6 6 60 60 1 Train/1/00001_00045_00020.png
33 33 5 6 27 28 2 Train/2/00002_00051_00005.png

Data type:

The data consists of images with bounding-box annotations. The following are a few images from the original data source: Image

Data Source:

The data we used is available on Kaggle; it was first published in 2011 on a dedicated benchmark website.
Kaggle Link : https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign
Data origin : https://benchmark.ini.rub.de/

Data Size:

How much data are we working with:
Train: 39,210 images across 43 classes
Test: 12,631 images
Total size: approximately 800 MB

Pre-Processing

For the custom CNN classification model, several preprocessing steps were applied to the GTSRB dataset:
Data augmentation
- Images were augmented by applying random rotations, shifts, zooms, and color changes. This expands the number of training examples to improve model generalization.
Sharpening
- A sharpening filter layer was added to the CNN model to enhance edge contrast in the traffic sign images. This helps the model better recognize shape features.
Normalization
- Pixel values were normalized to speed up training and optimize convergence.
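As an illustrative sketch of these two steps (not the project's exact training pipeline), augmentation and normalization can be expressed with plain NumPy; the function names here are hypothetical:

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly shift and brightness-jitter an image (H, W, C) with values in [0, 255]."""
    dx, dy = rng.integers(-3, 4, size=2)           # random shift of up to 3 px
    shifted = np.roll(img, (dy, dx), axis=(0, 1))  # wrap-around shift, for simplicity
    jitter = rng.uniform(0.8, 1.2)                 # random brightness change
    return np.clip(shifted * jitter, 0, 255)

def normalize(img: np.ndarray) -> np.ndarray:
    """Scale pixel values to the 0-1 range to speed up convergence."""
    return img.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(30, 30, 3)).astype(np.float32)
out = normalize(augment(img, rng))
print(out.shape, out.min() >= 0.0, out.max() <= 1.0)
```

The real pipeline also applies random rotations and zooms; the same pattern of "random transform, then scale to [0, 1]" applies.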

For the YOLO detection models, the dataset required different preprocessing:

Annotation conversion
- Bounding box coordinates and class labels were converted to the format required for YOLO model training.
Batching
- To enable more training epochs, multiple images were merged together into batches by overlaying them on a single canvas with random offsets. This allows the model to infer on multiple examples per pass.
Resizing
- Images were resized to standard dimensions suitable for input into the YOLO models.
In summary, specialized preprocessing was required to tailor the GTSRB dataset for both traffic sign classification using CNNs as well as detection using the YOLO models. The techniques improved model accuracy and training efficiency.

Methods¶

We tried three different methods: a custom CNN model, YOLOv5, and YOLOv8.

For the CNN, we started with a basic architecture, which produced extremely poor results. Further experimentation showed that data augmentation and a sharpening layer were required for better performance. The following table shows the results of this baseline model after those changes.

Metric Value
Test Precision 0.84
Test Recall 0.84
Test f1-score 0.83
Test Mean squared error 2.63

We discuss these methods in more depth in the Experiments section.

After initially experimenting with a CNN model for traffic sign recognition, the decision was made to explore YOLO (You Only Look Once) as an alternative approach. The rationale behind trying out YOLO stems from the fact that the dataset being used for this project is relatively old, and many conventional methods have already been implemented and studied extensively. As a result, the team sought to leverage YOLO, which is considered one of the newest and cutting-edge technologies in the field of computer vision and object detection.

YOLO has gained significant attention and acclaim due to its ability to provide real-time object detection with impressive accuracy. Unlike traditional region-based methods that require multiple stages to detect objects, YOLO takes a single-stage approach, making it faster and more efficient. This makes YOLO particularly suitable for applications that demand real-time processing, such as autonomous vehicles that need to react quickly to changing traffic conditions.

By evaluating YOLO on this dataset, the team aimed to investigate how well this advanced technique performs in the context of traffic sign recognition. Since YOLO is known for its speed and accuracy, it presented an exciting opportunity to potentially surpass the performance of previous methods, even on an older dataset.

Furthermore, experimenting with YOLO allows the team to contribute to the understanding of how this cutting-edge technology adapts to different datasets and scenarios. The findings from this investigation can shed light on the generalization capabilities of YOLO and its potential applicability to real-world autonomous driving scenarios, where quick and reliable traffic sign detection is crucial for safe navigation.

In summary, the decision to try out YOLO for traffic sign recognition was driven by the desire to explore the latest advancements in the field and assess its performance on a dataset where traditional methods have already been explored. By doing so, the team aimed to assess the effectiveness of YOLO and its potential as a powerful tool for real-time traffic sign recognition in autonomous driving systems.

The following is sample output from the YOLO model: Image

Experiments¶

CNN¶

Code is available here: https://github.com/sidharthamondal/adv_maths_final_grp13/tree/main/models/Base%20CNN

A CNN-based model for object detection and bounding box regression is a specialized architecture designed to simultaneously identify objects in images and accurately predict their bounding box coordinates. This type of model is particularly suited for tasks where detecting and localizing multiple objects within an image is essential.

Preprocessing: we tried multiple different approaches, but since the results were very poor, they are not included here.

Loading Data¶

The code first loads the pandas dataframe containing the bounding box coordinates and paths to the images.

Preprocessing Dataframe¶

The dataframe contains the original width and height of images. Since we are resizing all images to 30x30, the code updates the bounding box coordinates accordingly if the original image dimensions exceed 30x30.
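This coordinate update can be sketched as follows (a hypothetical helper, assuming the Width/Height and Roi columns from the CSV; the actual code only applies it when the original dimensions exceed 30x30):

```python
def rescale_roi(width, height, x1, y1, x2, y2, target=30):
    """Scale ROI corner coordinates when resizing an image to target x target."""
    sx, sy = target / width, target / height
    return x1 * sx, y1 * sy, x2 * sx, y2 * sy

# e.g. the second sample row of the CSV: a 65x65 image with ROI (6, 6, 60, 60)
print(rescale_roi(65, 65, 6, 6, 60, 60))
```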

Extracting Labels¶

The code extracts classification labels for each image by getting the parent directory name of the image path using pathlib.

Two dictionaries are created to map the string class names to integer labels.

Loading Datasets¶

The images paths are split into train and validation sets.

Creating TF Data Generators¶

Two TF data generators are created for feeding train and validation data batches to the model.

The parse function:

  • Reads in the image
  • Applies brightness/contrast adjustments if the image seems dark
  • Resizes the image to 30x30
  • Normalizes pixel values to the 0-1 range
  • Creates one-hot encoded labels

The generators shuffle the data, apply the parse function, and batch the data.

This covers the key data preprocessing steps done in this code.
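The actual pipeline uses TensorFlow ops; as a framework-free sketch of the same parse logic (brightness fix for dark images, nearest-neighbour resize, normalization, one-hot labels), with hypothetical names and an assumed darkness threshold:

```python
import numpy as np

N_CLASSES = 43

def parse(img: np.ndarray, label: int, size: int = 30):
    """Mimics the tf.data parse step: brighten dark images, resize, normalize, one-hot."""
    img = img.astype(np.float32)
    if img.mean() < 60:                   # heuristic darkness threshold (assumption)
        img = np.clip(img * 1.5, 0, 255)  # simple brightness boost
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size    # nearest-neighbour resize to size x size
    cols = np.arange(size) * w // size
    img = img[rows][:, cols]
    img = img / 255.0                     # normalize to [0, 1]
    one_hot = np.zeros(N_CLASSES, dtype=np.float32)
    one_hot[label] = 1.0
    return img, one_hot

img = np.full((64, 48, 3), 40.0)          # a dark 64x48 dummy image
x, y = parse(img, label=7)
print(x.shape, y.sum(), y.argmax())
```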

CNN Model Architecture¶

Input Layer¶

The input layer takes 30x30x3 RGB images.

Sharpen Layer¶

A custom sharpen layer is defined to sharpen the image edges and enhance features. It performs a 3x3 convolution.
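The exact layer implementation is in the linked code; conceptually, it convolves each channel with a 3x3 sharpening kernel. A NumPy sketch, assuming the standard sharpen kernel (an assumption, not necessarily the project's exact weights):

```python
import numpy as np

SHARPEN_KERNEL = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=np.float32)

def sharpen_channel(ch: np.ndarray) -> np.ndarray:
    """Valid 3x3 convolution of a single channel with the sharpening kernel."""
    h, w = ch.shape
    out = np.zeros((h - 2, w - 2), dtype=np.float32)
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(ch[i:i+3, j:j+3] * SHARPEN_KERNEL)
    return out

flat = np.full((5, 5), 10.0, dtype=np.float32)
print(sharpen_channel(flat))   # flat regions are unchanged: kernel weights sum to 1
```

Since the kernel weights sum to 1, uniform regions pass through unchanged while edges are amplified, which is what enhances shape features for the classifier.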

Convolutional Layer 1¶

  • 32 filters of size 5x5
  • ReLU activation function
  • He normal initializer
  • L2 regularization

Max Pooling Layer 1¶

2x2 max pooling to reduce dimensions

Dropout Layer 1¶

Dropout with rate 0.25

Flatten Layer¶

Flattens the output to 1D

Dense Layer 1¶

  • 256 unit fully connected layer
  • ReLU activation function
  • He normal initializer
  • L2 regularization

Dropout Layer 2¶

Dropout with rate 0.5

Classification Output¶

43 unit output layer with softmax activation for 43 class classification

Regression Output¶

4 unit output layer with linear activation for bounding box regression

The model has 1.4 million parameters.

It uses a typical convolutional neural network architecture with convolutional, pooling, and dense layers. The custom sharpen layer helps enhance features. Dropout is used to regularize the model.

The model has two output heads - one for classification and one for bounding box regression. This allows the model to jointly perform classification and localize the traffic sign in the image.

Following is the model architecture derived from this code

# Assumes TensorFlow/Keras; Sharpen is the custom layer defined earlier.
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, Dropout, Flatten, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.activations import relu
from tensorflow.keras.initializers import he_normal, zeros
from tensorflow.keras.regularizers import l2

def get_model():
    input_layer = Input(shape=(IMG_HEIGHT, IMG_WIDTH, N_CHANNELS,), name="input_layer", dtype='float32')
    sharp = Sharpen(num_outputs=(IMG_HEIGHT, IMG_WIDTH, N_CHANNELS,))(input_layer)
    conv_1 = Conv2D(filters=32, kernel_size=(5, 5), activation=relu, kernel_initializer=he_normal(seed=54),
                    bias_initializer=zeros(), name="convolutional_layer_1")(sharp)
    maxpool_1 = MaxPool2D(pool_size=(2, 2), name="maxpool_layer_1")(conv_1)
    dr1 = Dropout(0.25)(maxpool_1)
    flat = Flatten(name="flatten_layer")(dr1)

    d1 = Dense(units=256, activation=relu, kernel_initializer=he_normal(seed=45), bias_initializer=zeros(),
               name="dense_layer_1", kernel_regularizer=l2(0.001))(flat)
    dr2 = Dropout(0.5)(d1)

    # softmax over the 43 sign classes (matches the architecture description above)
    classification = Dense(units=43, activation='softmax', name="classification")(dr2)
    regression = Dense(units=4, activation='linear', name="regression")(dr2)

    model = Model(inputs=input_layer, outputs=[classification, regression])
    model.summary()
    return model

from tensorflow.keras.utils import plot_model
model = get_model()
plot_model(model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)
Image

Results CNN¶

The test results are quite good given the very small number of training epochs.

Metric Value
Test Precision 0.84
Test Recall 0.84
Test f1-score 0.83
Test Mean squared error 2.63

The following figure shows the training loss, which falls off steadily:

Final output results¶

Image

YOLO v5¶

Code is available here: https://github.com/sidharthamondal/adv_maths_final_grp13/tree/main/models/yolov5

YOLOv5 is a model in the You Only Look Once (YOLO) family of computer vision models. YOLOv5 is commonly used for detecting objects. YOLOv5 comes in four main versions: small (s), medium (m), large (l), and extra large (x), each offering progressively higher accuracy rates. Each variant also takes a different amount of time to train.

Steps Performed:¶

  • Data Preprocessing
  • Training
  • Inference
Image
We used the small variant, YOLOv5s, for training.

Data Preprocessing:¶

Preprocessing mainly involves creating a custom dataset for YOLOv5: writing a dataset.yaml file, creating custom labels, and organizing directories.

Dataset.yaml file:¶

The dataset.yaml file is a YAML configuration file that contains information about the dataset, such as the location of the training and validation data, the number of classes, and other dataset-specific settings. The file helps YOLOv5 to load and process the dataset correctly during training. You will need to specify the paths to the image and label directories, the number of classes, and other optional parameters like image augmentation settings
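A minimal example of what such a file might look like (the paths and class names here are illustrative, not the project's actual config):

```yaml
# dataset.yaml -- illustrative YOLOv5 dataset config
path: ../datasets/gtsrb      # dataset root (assumed location)
train: images/train          # train images, relative to path
val: images/val              # validation images, relative to path

nc: 43                       # number of classes
names: ['speed_limit_20', 'speed_limit_30']   # one name per class; truncated here
```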

Image

Custom labels:¶

YOLOv5 requires custom labels to be created for each object class present in your dataset. The labels are usually text files associated with each corresponding image, and they contain information about the bounding boxes and class labels for the objects present in the image. Each line in the label file represents one object in the image and consists of the class index (integer) and the coordinates of the bounding box in the format (x_center, y_center, width, height) relative to the image size.
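For the GTSRB CSV rows, the conversion from corner coordinates (Roi.X1, Roi.Y1, Roi.X2, Roi.Y2) to this YOLO format can be sketched as follows (a hypothetical helper, not the project's exact script):

```python
def roi_to_yolo(img_w, img_h, x1, y1, x2, y2, class_id):
    """Convert a corner-format ROI into a YOLO label line:
    class x_center y_center width height, all coordinates normalized to [0, 1]."""
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# First sample row of train.csv: 30x29 image, ROI (6, 5, 25, 24), class 0
print(roi_to_yolo(30, 29, 6, 5, 25, 24, 0))
# → "0 0.516667 0.500000 0.633333 0.655172"
```

Each such line is written to a .txt file that shares its name with the corresponding image.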

Organizing directories:¶

To use YOLOv5 effectively, it's essential to organize your dataset in a specific directory structure. Typically, you will have separate directories for images and labels. The images directory will contain all the training and validation images, while the labels directory will contain the corresponding label files. The labels and images should have the same name (excluding the file extension) to ensure proper pairing.

Image

YOLO v5 Results¶

YOLOv5s was trained twice for a total of 20 epochs on a dataset of 20K images, including the augmented images. Approximately 7.2 million parameters were trained across 157 layers, and each epoch took about 30 minutes. The accuracy was 76%, with a precision of 87% and an mAP score of 85%.

Image

YOLO v8¶

code is available here: https://github.com/sidharthamondal/adv_maths_final_grp13/tree/main/models/yolov8

YOLOv8 is a state-of-the-art model whose Nano variant is its most efficient and fastest. The model exhibits excellent performance on the test data, achieving a high IoU of 0.95 and roughly 90% accuracy. However, when applied to real-life data, its performance falls short, indicating a lack of generalization.

The primary reason for this discrepancy lies in the training data source. The test data may not sufficiently cover the diversity and complexity of real-life scenarios, leading to overfitting: the model becomes highly specialized on the test distribution but struggles to adapt to unseen, real-world variation. The extended training time also suggests the dataset's size and complexity may contribute to slow convergence and potential overfitting. Obtaining a more diverse and representative dataset, employing data augmentation, and considering transfer learning could improve real-life performance.

The test IoU is around 95% and the test accuracy around 90%. The following is the training code for YOLOv8:


from ultralytics import YOLO

# Build a new YOLOv8 model from scratch (yolov8m.yaml is the medium variant)
model = YOLO("yolov8m.yaml")

# Train the model on the custom dataset config
results = model.train(data="google_colab_config.yaml", epochs=10)
Image

Some of the detected images¶

Issues¶

Data issue, results vs. multiple signs: while the results on individual signs were good, the model failed to work when multiple signs were presented simultaneously. Even though the model performed well otherwise, it could not detect multiple signs at a time; the following figure demonstrates this:
Image

The issue required a thoughtful approach. By merging individual images onto one canvas, we found a viable solution that allowed the model to process and interpret multiple signs simultaneously, significantly improving the overall efficiency and performance of the system.

To achieve this, we carefully calculated the appropriate canvas size. The canvas width accommodates the combined width of all the selected signs plus a fixed spacing between them, so each sign has enough room to be processed independently while remaining close enough to be analysed together. The canvas height is set from the maximum sign height in the dataset, guaranteeing that no sign is cut off or loses information during merging. This preserved the integrity of each sign and all details relevant for accurate analysis.

With this canvas configuration in place, the model could handle multiple signs in a single pass without interference or loss of data, which considerably sped up interpretation and analysis. Overall, merging the images onto one canvas, with a carefully chosen canvas size, proved to be a breakthrough: it improved both processing speed and the accuracy of the model's detections. The following figure shows the results on our augmented data:
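A simplified sketch of this merging step (a hypothetical helper; the real implementation also writes out the correspondingly shifted YOLO labels):

```python
import numpy as np

def merge_on_canvas(images, spacing=10):
    """Place sign crops side by side on one canvas, `spacing` px apart.
    Returns the canvas and the (x_offset, y_offset) of each pasted image."""
    widths = [img.shape[1] for img in images]
    heights = [img.shape[0] for img in images]
    canvas_w = sum(widths) + spacing * (len(images) - 1)
    canvas_h = max(heights)            # the tallest sign sets the canvas height
    canvas = np.zeros((canvas_h, canvas_w, 3), dtype=np.uint8)
    offsets, x = [], 0
    for img in images:
        h, w = img.shape[:2]
        canvas[:h, x:x + w] = img      # paste at current x, top-aligned
        offsets.append((x, 0))         # bounding-box labels shift by this offset
        x += w + spacing
    return canvas, offsets

imgs = [np.full((30, 30, 3), 100, np.uint8), np.full((65, 65, 3), 200, np.uint8)]
canvas, offsets = merge_on_canvas(imgs)
print(canvas.shape, offsets)   # → (65, 105, 3) [(0, 0), (40, 0)]
```

Each original bounding box only needs its x/y coordinates shifted by the returned offset to stay valid on the merged canvas.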

Image

Issues Resolved¶

Training time: the model initially had an extremely high training time, even for a single epoch, which hindered the efficiency of the training process. This was mitigated by merging images onto one canvas, resulting in a significant reduction in training time.

Accuracy: the accuracy achieved was satisfactory but did not reach the desired level, largely because of the limited amount of training we could afford. Reaching very high accuracy required substantially more training, which was initially challenging to attain.

Epochs: after obtaining the updated (merged) data and resolving the training time issue, the model was successfully trained for 50 epochs, which significantly improved its performance and contributed to higher accuracy.

Output after resolving the issue: Image

Conclusion¶

MODEL                           ACCURACY   mAP@0.5
YOLOv5                          75%        86%
CNN                             85%        95%
YOLOv8                          90%        95%
YOLOv8 with data modification   94%        98%

The summary indicates that YOLOv5 has the lowest accuracy (75%) and mAP (86%) of the models compared. The CNN performs better, with an accuracy of 85% and an mAP of 95%. YOLOv8 outperforms both, with an accuracy of 90% and an mAP of 95%. The best-performing model, however, is YOLOv8 with data modification, achieving the highest accuracy of 94% and an impressive mAP of 98%.

Overall, YOLOv8 with data modification demonstrates the best performance in terms of both accuracy and mAP, making it the most effective model for the given task.

import plotly.graph_objects as go

# Data from the table
model_names = ['Yolo v5', 'CNN', 'Yolo v8', 'Yolo v8 with Data Modification']
accuracies = [75, 85, 90, 94]
mAP_scores = [86, 95, 95, 98]

# Create a bar chart for accuracy
fig = go.Figure(data=[go.Bar(x=model_names, y=accuracies, text=accuracies, textposition='auto')])

# Update the layout of the chart
fig.update_layout(
    title="Model Accuracy Comparison",
    xaxis_title="Model",
    yaxis_title="Accuracy (%)",
)

# Show the chart
fig.show()

# Create a bar chart for mAP at 0.5
fig_mAP = go.Figure(data=[go.Bar(x=model_names, y=mAP_scores, text=mAP_scores, textposition='auto')])

# Update the layout of the chart
fig_mAP.update_layout(
    title="Model mAP at 0.5 Comparison",
    xaxis_title="Model",
    yaxis_title="mAP at 0.5 (%)",
)

# Show the chart
fig_mAP.show()

References¶

  • https://ultralytics.com/
  • https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign
  • https://colab.research.google.com/
  • https://github.com/ultralytics/yolov5
  • https://ultralytics.com/yolov5
  • https://docs.ultralytics.com/reference/engine/results/

Streamlit¶

The best model is deployed on Streamlit for testing; it is available at https://advmathsfinalgrp13-ejjqxnduvpmoooxnatgvjh.streamlit.app/

Next Steps¶

  • The code currently works well in ideal conditions: on static images or videos with clear, well-lit scenes, the object detection application functions as expected, accurately detecting signs and providing satisfactory results. Real-world scenarios are more complex and challenging, with varying lighting conditions, occlusions, and dynamic environments, so thorough testing and validation in real-life conditions are necessary to assess the model's robustness.

  • This needs to be tested in real life and in a moving vehicle to check the model performance: While the current implementation might perform well on static images and videos, its true potential and accuracy can only be evaluated in real-life situations. Integrating the object detection system into a moving vehicle allows for real-time testing and validation under dynamic and unpredictable conditions. It can help assess how well the model handles object detection when the camera is mounted on a vehicle in motion, capturing real-world road scenes with potential obstacles, pedestrians, and other vehicles. Real-life testing also helps identify any shortcomings or areas for improvement in the model's performance.

  • It was an amazing exercise